European Journal of Human Genetics — Latest Matching Preprints

1

The power of geohistorical boundaries for modeling the genetic background of human populations: the case of the rural Catalan Pyrenees

Fibla, J.; Maceda, I.; Laplana, M.; Guerrero, M.; Alvarez, M. M.; Burgueno, J.; Camps, A.; Fabrega, J.; Felisart, J.; Grane, J.; Remon, J. L.; Serra, J.; Moral, P.; Lao, O.

2022-10-31 genetics 10.1101/2022.10.28.513229 medRxiv

Top 0.1%

59.8%


The genetic variation of the European population at a macro-geographic scale follows genetic gradients that reflect major migration events. However, less is known about the factors affecting mating choices at a micro-geographic scale. In this study we analyzed 726,718 autosomal SNPs in 435 individuals from the Catalan Pyrenees, covering around 200 km of a vast and rugged region in the north of the Iberian Peninsula, for which we have information about the geographic origin of all grandparents and parents. At a macro-geographic scale, our analyses recapitulate the genetic gradient observed in Spain. However, we also identified micro-population substructure among the sampled individuals. This substructure does not correlate with geographic barriers, such as those expected from the orography of the region, but with the bishoprics present in the covered geographic area. These results support the view that, on top of major human migrations, long-standing socio-cultural factors have also shaped the genetic diversity observed in rural populations.

2

Reshaping the Hexagone: the genetic landscape of modern France

Biagini, S. A.; Carracedo, A.; Comas, D.; Calafell, F.

2019-07-29 genetics 10.1101/718098 medRxiv

Top 0.1%

44.8%


Compared with other European countries, the human population genetics and demographic history of Metropolitan France are surprisingly understudied. In this work, we combined newly genotyped samples from various regions of France with publicly available data and applied both allele-frequency and haplotype-based methods to genome-wide single nucleotide polymorphism (SNP) array genotypes in order to describe the internal structure of the country. We found that French Basques are genetically distinct from all other populations in the Hexagone and that the populations of southwest France (namely the Gascony region) share a large proportion of their ancestry with Basques. Otherwise, the genetic makeup of the French population is relatively homogeneous and mostly related to Southern and Central European groups. However, a fine-grained, haplotype-based analysis revealed that Bretons are slightly separated from the rest of the groups, mostly due to gene flow from the British Isles. This signal coincides both with the historically attested Celtic population movements to the area between the 3rd and the 9th centuries CE and with a more ancient genetic continuity between Brittany and the British Isles related to shared drift with hunter-gatherer populations. Haplotype-based methods also unveiled subtle internal structure and connections with the surrounding modern populations, particularly in the periphery of the Hexagone.

3

ACMG Secondary Findings in the Brazilian Rare Genomes Project: Insights from 5,402 genome sequencing

Perrone, E.; Virmond, L.; Campos Coelho, A. V.; de Franca, M.; Moreno, C. A.; Prota, J. R. M.; Espolaor, J. G. d. A.; Migliavacca, M.; Tonholo Silva, T. Y.; Quaio, C. R. D. C.; Ceroni, J. R. M.; Chen, K.; Minillo, R. M.; Teixeira, A. C. B.; Yamada, R. Y.; Cintra, V. P.; de Santana, L. S.; Campilongo, G. P.; Ribeiro da Silva, R. M.; Pelegrino, K. O.; Filho, J. B. d. O.; de Almeida, T. F.

2025-01-23 genetic and genomic medicine 10.1101/2025.01.22.25320957 medRxiv

Top 0.1%

39.5%


Purpose: Secondary findings (SF) are pathogenic or likely pathogenic variants in genes unrelated to the primary purpose of genetic testing. The American College of Medical Genetics (ACMG) provides guidelines on which SF should be reported, covering 81 genes linked to different conditions. With the increasing use of genome sequencing (GS), SF are detected more frequently, presenting challenges for healthcare systems. The Brazilian population is often underrepresented in genomic studies, which limits population-specific knowledge. Objective: This study aimed to outline the profile of SF in the Brazilian Rare Genomes Project (BRGP). Methods: We retrospectively analyzed SF (ACMG) data from GS of 5,402 BRGP individuals. Results: Of the 5,316 cases who consented to receive SF, 3.6% (191 cases) had at least one SF. The most commonly implicated genes were TTR, TTN, and BRCA2. SF were mainly related to cardiovascular conditions (40.2%) and cancer predisposition (37.6%). Some variants, such as TTR: c.424G>A; p.(Val142Ile) and TP53: c.1010G>A; p.(Arg337His), were recurrent, reflecting population-specific traits and founder effects. Novel variants accounted for 10.6% of SF. Conclusion: The SF rate varies across studies and populations. While SF can aid early diagnosis, their relevance is debated due to potential psychological and healthcare burdens. Effective genetic counseling and public health policies are essential.
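The headline SF rate is a simple binomial proportion whose denominator is the 5,316 consenting cases, not all 5,402 sequenced individuals. As an illustration only, the Python sketch below recomputes it and attaches a Wilson score 95% interval; the interval is not reported in the abstract and is shown only to convey sampling uncertainty around the point estimate.

```python
from math import sqrt


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Counts quoted in the abstract: 191 of 5,316 consenting cases had >= 1 SF.
cases_with_sf, consenting = 191, 5316
rate = cases_with_sf / consenting
low, high = wilson_interval(cases_with_sf, consenting)
print(f"SF rate: {rate:.1%} (95% CI {low:.1%} to {high:.1%})")
```

The Wilson interval is used here rather than the simple normal approximation because it behaves better for proportions near 0, as is the case for a rate of a few percent.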

4

Medical genetics workforce in Brazil: practitioners, services, and disease distribution

Bonilla, C.; Sortica, V. A.; Schuler-Faccini, L.; Matijasevich, A.; Scheffer, M. C.

2021-10-18 genetic and genomic medicine 10.1101/2021.10.14.21265027 medRxiv

Top 0.1%

33.5%


Purpose: In anticipation of the implementation of personalized medicine (PM) in Brazil, we assessed the demographic characteristics of its medical genetics workforce together with the distribution of rare genetic diseases (RGD) and hereditary cancer syndromes (HCS) across municipalities in the country. Methods: We used demographic data from an earlier report on medical specialties and open databases providing summarized data on the public and private healthcare systems for the years 2019 and 2020. In the public system we considered RGD live births and hospitalizations, and HCS mortality. In the private system we obtained data on RGD, HCS and genetic counselling appointments. Results: The 332 registered medical geneticists (MGs) were mostly female, had attended a public medical school, and were predominantly registered in the Southeast. The distribution of MGs overlapped the countrywide distribution of all types of genetic disease and service examined, indicating that ~30% of the patient population has access to an MG specialist. Conclusion: The Brazilian MG workforce is concentrated in the richest and most populated areas, and while it covers a significant proportion of the population, there are vast regions with very limited services. The public health system should address these inequalities for a successful transition to PM.

5

Genetic consequences of serial sperm donation

Zheng, T. M.; Mejia-Garcia, A.; Bherer, C.; Laprise, C.; Laberge, A.-M.; Gravel, S.

2025-09-21 sexual and reproductive health 10.1101/2025.09.19.25336188 medRxiv

Top 0.1%

32.9%


Study question: How does serial sperm donation affect genetic risk in donor-conceived children and their descendants? Summary answer: In addition to the psychological effects of serial sperm donation, donor-conceived children are at risk of unintentional inbreeding. This risk is compounded by the hard-to-quantify effect of social proximity between mothers. Such inbreeding would cause children to have up to 15% excess risk of childhood mortality or congenital morbidities. The risk to descendants after many generations is spread across many individuals and remains low as long as the number of donor-conceived children does not increase appreciably. What is known already: Inbreeding increases the risk for a range of diseases among the offspring, with the risk increasing with the degree of inbreeding. Sperm donation increases the risk of accidental inbreeding, and thus likely increases disease risk. Study design, size, duration: We performed a literature review of risks associated with consanguinity across a range of traits, together with a model-based mathematical analysis to estimate the short- and long-term risk associated with serial sperm donation. Participants/materials, setting, methods: We used whole-genome sequencing and imputed sequence data from the CARTaGENE longitudinal study to estimate the population prevalence of relevant risk alleles. We performed mathematical modelling based on these results and on published estimates of the risk associated with inbreeding. Main results and the role of chance: With over 600 children conceived in this serial sperm donation event, 0.1 consanguineous unions would be expected under the simplest model of random mating by generation within the province of Quebec. Preferential mating due to geographic and social proximity among the mothers could increase this rate appreciably, so that accidental inbreeding is not unlikely.
Since the likelihood of inbreeding events increases quadratically with the number of children, active inbreeding avoidance by the offspring and interventions to reduce continued serial donation can reduce risk. Over generations, more distant inbreeding is unavoidable, but inbreeding coefficients are reduced. Our model predicts that the long-term excess number of serious adverse events will be fewer than one per generation. The short- and long-term rates of specific diseases may nevertheless be affected: given public information about the donor's carrier status, we expect an excess of 0.84 children per generation [95% CI: 0-3] affected by hereditary tyrosinemia type 1. Large scale data: CARTaGENE is a biobank based in Quebec, Canada, that is accessible following an independent data access protocol (https://cartagene.qc.ca/en/). Limitations, reasons for caution: Our analysis relies on uncertain estimates of the burden associated with inbreeding. We also rely on simplifying assumptions about future events, including migrations, social interactions between mothers, and future sperm donation events. As a result, our estimates should be seen as coarse. Wider implications of the findings: Serial sperm donation is not uncommon. Each documented instance has raised questions about the genetic burden associated with the practice. By quantifying this risk, this study will help inform the public health and genetic counselling response to these situations, in addition to being of interest from a population genetics perspective. Study funding/competing interest(s): This research was supported by the Canadian Institutes of Health Research (CIHR) project grant 437576, NSERC grant RGPIN-2017-04816, the Canada Research Chair program to S.G., and the Canada Foundation for Innovation. T.M.Z. was supported by the QLS Grad and Grad Excellence Award. The authors report no competing interests.
Consanguinity: The degree of relatedness between individuals, as measured by inheritance from recent ancestors. For example, second cousins share on average 3.125% of their DNA through their great-grandparents. Inbreeding: The production of offspring from individuals with high consanguinity. Runs of Homozygosity (ROH): Stretches of the genome where identical alleles were received from both parents. The fraction of the genome in ROH is a measure of inbreeding. Donor-Conceived Child (DCC): Child born following sperm donation. DCC(X) refers to a child born following sperm donation by individual X. Congenital Morbidity: Diseases or medical conditions present from birth, including physical, intellectual, or developmental conditions. Specifically, this excludes diseases or conditions that arise from exposure to medications or chemicals during gestation or from infections during pregnancy.
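Two quantitative statements in this entry can be checked with elementary pedigree arithmetic: the 3.125% shared by second cousins, and the quadratic growth of inbreeding opportunities with the number of donor-conceived children. The Python sketch below assumes Wright's path-counting rule for the coefficient of relationship and simply counts half-sibling pairs; it does not reproduce the authors' model or their expected 0.1 consanguineous unions, which additionally depend on the probability that any given pair mates.

```python
from fractions import Fraction
from math import comb


def relationship_coefficient(meioses_per_path: list[int]) -> Fraction:
    """Wright's coefficient of relationship: the sum of (1/2)**n over every
    path joining two relatives through a common ancestor, where n is the number
    of meioses on that path (assumes the common ancestors are not inbred)."""
    return sum((Fraction(1, 2) ** n for n in meioses_per_path), Fraction(0))


# Second cousins share two great-grandparents; each path has 3 + 3 = 6 meioses.
r = relationship_coefficient([6, 6])
print(r, f"{float(r):.3%}")          # 1/32 3.125%


def half_sibling_pairs(n_children: int) -> int:
    """Distinct pairs among n donor-conceived half-siblings: n(n-1)/2."""
    return comb(n_children, 2)


for n in (100, 200, 600):
    print(n, half_sibling_pairs(n))  # 4950, 19900, 179700 pairs
```

Going from 100 to 200 children roughly quadruples the number of potential pairs (4,950 to 19,900), which is the sense in which the likelihood of inbreeding events grows quadratically with the number of children.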

6

Addendum to Ancient DNA data from Mengzi Ren, a Late Pleistocene individual from Southeast Asia, cannot be reliably used in population genetic analysis

Tabin, D. R.; Patterson, N. J.; Mah, M.; Reich, D. E.

2025-03-26 genetics 10.1101/2025.03.24.645126 medRxiv

Top 0.1%

28.8%


In addition to the issues pointed out in Tabin et al. (1), the MZR data from Zhang et al. 2022 (2) are suggestive of high levels of contamination from a source similar to modern Han Chinese, the majority population in the country where MZR was sequenced. In fact, MZR can be modeled entirely as Han-related ancestry plus noise. These results raise further concerns about the veracity of the MZR data and thus the paper's historical conclusions.

7

What can Y-DNA analysis reveal about the surname Hay and the Hay noble lineage of Scotland?

Stead, P.; Haddrill, P. R.; Macdonald, A. F.

2025-07-15 genetics 10.1101/2025.07.09.664039 medRxiv

Top 0.1%

28.2%


The family name Hay (plus associated spelling variants) is a prominent surname of Anglo-Norman origin that has been well documented as a Scottish noble lineage since the 12th century CE. Its historical significance, linked to the rise of the Anglo-Norman era (1093-1286 CE) in Scotland, and the historical complexities of surname adoption after the Norman conquest of England justify the need for a comprehensive understanding of the genetic history of the Hay noble lineage. This study examines the patterns of paternal inheritance in lineages with the Hay surname. We conducted a comprehensive analysis of Y-chromosome data publicly available on the Family Tree DNA (FTDNA) platform and in specific FTDNA surname projects, and looked in more detail at three well-documented male-line descendants of William II de la HAYA, 1st of Erroll (d. 1201), whose descent has been verified to a high degree of confidence. Our results reveal that all descendants of William II de la HAYA, 1st of Erroll (d. 1201), derive from the multigenerational Y-SNPs R1a-YP6500 (plus equivalent SNPs BY33394 / FT2017) and R1a-FTT161. Furthermore, subclades of R1a-FTT161 have been identified that confirm direct male-line descent from two of William II de la HAYA's sons. Subclade R1a-BY199342 (plus equivalents) confirms direct male-line descent from David de la HAYA, 2nd of Erroll (d. 1241), and subclade R1a-FTA7312 confirms direct male-line descent from Robert de la HAYA of Erroll. The results also confirm that the Hay noble lineage shares the Y-SNP R1a-YP4138 (estimated to have arisen around 832 CE) with several non-Hay testers whose surnames are of Norman origin, providing further evidence to support the Norman origin hypothesis for these surnames.
In addition to identifying multigenerational Y-SNPs associated with documented Hay noblemen, this study observed significant Y-DNA haplogroup diversity among males with the surname Hay (plus associated spelling variants: Hays, Haye, Hayes, Hey and Haya). Our results show that only 22% of the men sampled (n=109) with the surname Hay (or an associated spelling variant) descend from the 12th-century progenitor of the noble Hay lineage of Scotland. This confirms that a significant proportion of males with the surname Hay do not descend from the noble progenitor of the surname.

8

Disparities and trends in global representation of human genetics conferences: a 26-year longitudinal study of ASHG and ESHG

Zheng, H.; Wang, Y.; Tanigawa, Y.; Ong, J. S.; MacGregor, S.; Liang, L.; Kellis, M.; Han, X.

2025-08-16 genetic and genomic medicine 10.1101/2025.08.12.25333491 medRxiv

Top 0.1%

27.9%


Equity in human genetics research requires balanced participation not only of study participants from global populations but also of the researchers who drive the science. While disparities among research participants across ancestries and countries have been well studied, the representation of, and disparities among, researchers themselves on the global stage remain poorly understood. Here, we analyzed over 100,000 abstracts presented from 1999 to 2024 at two leading annual conferences in the field, those of the American Society of Human Genetics (ASHG) and the European Society of Human Genetics (ESHG), to assess trends and geographic disparities. North America and Europe consistently dominated abstract contributions, whereas regions such as Africa, Oceania, and East Asia remained underrepresented, despite gradual increases in participation. The imbalance was even more pronounced for oral presentations: at ASHG, abstracts from North America were approximately 4 times more likely to be selected for talks than those from East Asia and 23 times more likely than those from South America; at ESHG, Europe's advantage was 2 times and 9 times, respectively. Notably, Oceania had the highest relative success in oral presentation, with a ratio 5 times higher than East Asia and 29 times higher than South America at ASHG, and 8 times and 33 times higher, respectively, at ESHG. To explore potential drivers of these disparities, we examined 6 national-level variables. The multivariable regression model indicated that GDP is the primary factor for abstract counts, while Nature Index Share is the main factor for oral presentation counts. Our findings highlight persistent global inequalities in the representation of human geneticists at premier conferences. Greater international support and targeted initiatives are needed to promote more equitable worldwide involvement in human genetics.

9

Digital genetic counselling services for cascade cardiogenetic testing: a focus group study on proband, relative, and provider perspectives

van Lingen, M. N.; van Till, S. A. L.; Giesbertz, N. A. A.; Beinema, T. C.; Ausems, M. G. E. M.; Klaassen, R.; Cornel, M. C.; van den Heuvel, L. M.; van Tintelen, J. P.

2024-12-05 genetic and genomic medicine 10.1101/2024.11.27.24318108 medRxiv

Top 0.1%

26.9%


Digital interventions are a potentially promising way to improve the accessibility and efficiency of genetic counselling services. However, the current literature on stakeholder perspectives towards digital tools for cascade testing is limited. Therefore, this focus group study aimed to gain insight into the attitudes and perspectives of probands, at-risk relatives (ARR), and genetic healthcare professionals (HCP) towards digital innovations to assist with pre-test and post-test counselling and cascade genetic testing in cardiogenetics. We conducted seven online focus groups, which were transcribed and thematically analysed. In total, 37 individuals participated (10 probands, 11 ARR and 16 HCP). Thematic analysis of the focus group transcripts identified a first theme: (1) acceptability of digital tools. The other themes were defined as domains where digital tools affect traditional, in-person clinical genetic care, namely: (2) family communication, (3) decision-making, (4) care relations, and (5) the genetic care system. Stakeholders expressed a predominantly positive attitude towards digitisation of (parts of) predictive genetic counselling in cardiogenetics, on the condition that access to human contact is preserved. In the clinical setting of predictive counselling, efforts should be made to ensure access to genetic services for all ARR and to protect the in-person involvement of HCP.

10

Variant curation of the largest compendium of FOXL2 coding and non-coding sequence and structural variants in BPES

Matton, C.; Van De Velde, J.; De Bruyne, M.; Van De Sompele, S.; Hooghe, S.; Syryn, H.; Bauwens, M.; D'haene, E.; Dheedene, A.; Cools, M.; Komatsuzaki, S.; Preizner-Rzucidlo, E.; Ross, A.; Armstrong, C.; Watkins, W.; Shelling, A.; Vincent, A. L.; Cassiman, C.; Vermeer, S.; Bunyan, D. J.; Verdin, H.; De Baere, E.

2026-03-02 genetic and genomic medicine 10.64898/2026.02.24.25339471 medRxiv

Top 0.1%

23.5%


Heterozygous FOXL2 (non-)coding sequence and structural variants (SVs) lead to blepharophimosis, ptosis and epicanthus inversus syndrome (BPES), a rare, autosomal dominant developmental disorder characterized by a completely penetrant eyelid malformation and incompletely penetrant primary ovarian insufficiency (POI). We collected variants from our in-house database, generated via clinical genetic testing and downstream research testing at the Center for Medical Genetics Ghent, Belgium (2001-2024), and from the literature and other resources over the same period. All retrieved variants were categorized using ACMG/AMP classifications to improve knowledge of their pathogenicity. We collected 413 unique genetic defects of the FOXL2 region, including 76 novel variants, in 864 index patients. Of these patients, 87% carried a coding FOXL2 sequence variant. The polyalanine tract is a known mutational hotspot of FOXL2, as illustrated here by the high percentage of pathogenic polyalanine expansions (24%). Furthermore, the molecular spectrum in typical BPES index patients comprises 8% coding deletions and 3% deletions located up- and downstream of FOXL2. The remaining 2% carry translocations along with chromosomal rearrangements of 3q23. This uniform and structured reclassification, incorporating the largest dataset so far of variants implicated in FOXL2-associated disease, will improve both diagnosis and genetic counselling for individuals with BPES.

11

Next generation sequencing identifies a pattern of novel germline variants in early-onset colorectal cancer

VANDE PERRE, P.; AL SAATI, A.; CABARROU, B.; PLENECASSAGNES, J.; GILHODES, J.; MONSELET, N.; LIGNON, N.; FILLERON, T.; VILLARZEL, C.; GOURDAIN, L.; SELVES, J.; MARTINEZ, M.; CHIPOULET, E.; COLLET, G.; MALLET, L.; BONNET, D.; GUIMBAUD, R.; Toulas, C.

2024-12-12 genetics 10.1101/2024.12.09.627474 medRxiv

Top 0.1%

23.4%


Early-onset colorectal cancer (EOCRC) incidence is increasing rapidly worldwide. However, the majority of EOCRCs are not explained by germline variants in the main colorectal cancer (CRC) predisposition genes (the "DIGE" panel). To investigate potential genetic transmission of EOCRC (dominant, recessive and oligogenic hypotheses) and thus identify potentially novel EOCRC-specific predisposition genes, we analysed 585 cancer pathway genes in an EOCRC patient cohort (n=87 patients diagnosed at ≤40 years of age, DIGE-) with or without a CRC family history. By comparing this germline variant spectrum to the gnomAD cancer-free database, we identified high-impact variants (HVs) in 15 genes that were significantly over-represented in the EOCRC cohort. Among the 32 unrelated patients with a CRC family history (i.e. with a potentially dominant transmission pattern), nine presented HVs in ten of the genes tested; four of these genes have a DNA repair function. Our results support neither a recessive transmission of EOCRC in patients without a CRC family history nor an oligogenic transmission. We subsequently sequenced these 15 genes in a cohort of 82 late-onset CRCs (cancer diagnosis at ≥50 years, DIGE-) and found variants in 11 of these genes to be specific to EOCRC. To evaluate whether variants in these 11 genes could specifically detect EOCRC patients, we screened our patient database (n=6482), which contained only 2% EOCRCs (DIGE-), and identified two other EOCRC cases diagnosed after our cohort was assembled, with individual HVs in RECQL4 and NUTM1. Altogether, we showed that 37.5% of carriers of heterozygous NUTM1 HVs and 18.75% of carriers of heterozygous RECQL4 HVs in our database had been diagnosed with EOCRC. Our work has identified a pattern of germline gene variants not previously associated with EOCRC. This paves the way to addressing the contribution of these variants to EOCRC risk and oncogenesis.
Author Summary: Early-onset colorectal cancer (diagnosed at ≤40 years of age) is a rare disease that can in part be explained by a hereditary genetic predisposition. To identify novel gene variants potentially associated with EOCRC risk, we analysed a panel of 585 genes in 87 patients with early-onset colorectal cancer unexplained by conventional genetic tests. This first analysis highlighted 15 genes of interest. To evaluate whether this genetic profile is specific to early onset, we sequenced these 15 genes in a population of late-onset colorectal cancers (diagnosed after 50 years of age). Variants in 11 of these genes were specific to the early-onset population. To assess whether this genetic pattern can identify other early-onset cases, we screened these genes in our whole database of 6482 patients and identified two new early-onset cases. Our results need to be confirmed and validated in larger cohorts, but they pave the way for future research into early-onset colorectal cancer and the possibility of improving screening or treatment options for these patients and their family members.

12

COVID-19 risk haplogroups differ between populations, deviate from Neanderthal haplotypes and compromise risk assessment in non-Europeans

Wohlers, I.; Calonga-Solis, V.; Jobst, J.-N.; Busch, H.

2020-11-03 genetics 10.1101/2020.11.02.365551 medRxiv

Top 0.1%

23.0%


Recent genome-wide association studies (GWAS) have identified genetic risk factors for developing severe COVID-19 symptoms. The first published study reported a 1 bp insertion, rs11385942, on chromosome 3 (1), and subsequent studies reported single nucleotide variants (SNVs) such as rs35044562, rs67959919 (2) and rs13078854 (3), all highly correlated with each other. Zeberg and Paabo (4) subsequently traced them back to Neanderthal origin. They found that a 49.4 kb genomic region including the risk allele of rs35044562 is inherited from Neanderthals of Vindija in Croatia. Here we add a differently focused evaluation of this major genetic risk factor to these recent analyses. We show that (i) the COVID-19-related genetic factors of three previously assessed Neanderthals deviate from those of modern humans and that (ii) they differ among worldwide human populations, which compromises risk prediction in non-Europeans. Caution is thus currently advised in the genetic risk assessment of non-Europeans during the worldwide COVID-19 pandemic.
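In GWAS, variants described as "highly correlated" are typically in strong linkage disequilibrium (LD), commonly summarized by r², the squared correlation between the alleles carried at two sites on the same haplotypes. A minimal Python sketch, assuming phased haplotypes coded 0/1 (1 = allele of interest); the haplotypes are hypothetical toy data, not genotypes at the COVID-19 risk variants.

```python
from typing import Sequence


def ld_r2(hap_a: Sequence[int], hap_b: Sequence[int]) -> float:
    """r^2 between two biallelic sites from phased haplotypes coded 0/1."""
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("need two non-empty haplotype vectors of equal length")
    p_a = sum(hap_a) / n                                  # allele frequency at site A
    p_b = sum(hap_b) / n                                  # allele frequency at site B
    p_ab = sum(a & b for a, b in zip(hap_a, hap_b)) / n   # frequency of the A-B haplotype
    d = p_ab - p_a * p_b                                  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("a monomorphic site has undefined r^2")
    return d * d / denom


# Hypothetical toy haplotypes (10 chromosomes); not real data.
site_1 = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
site_2 = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
print(round(ld_r2(site_1, site_2), 3))   # 0.667
```

An r² of 1 means one variant perfectly predicts the other on every haplotype, which is why strongly linked variants can act as proxies for the same underlying signal.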

13

Rare diseases load through the study of a regional population

Michel, E.; Moreau, C.; Gagnon, L.; Leblanc, J.; Tardif, J.; Girard, L.; Mathieu, J.; Gagnon, C.; Desmeules, M.; Brisson, J.-D.; Bouchard, L.; Girard, S. L.

2024-10-30 genetic and genomic medicine 10.1101/2024.10.29.24316346 medRxiv

Top 0.1%

22.8%


Rare genetic diseases affect many people worldwide and are challenging to diagnose. In this study, we introduce a novel regional population cohort approach to identify pathogenic variants that occur more frequently within specific populations and are of clinical interest. We used a cohort from Quebec, including the Saguenay-Lac-Saint-Jean region, which is known for its founder effect and higher frequency of certain pathogenic variants. By analyzing both the frequency of these variants and their origin through shared identical-by-descent segments, we validated 38 variants previously reported as being more common due to the founder effect. Additionally, we identified 42 previously unreported founder variants in Quebec or the Saguenay-Lac-Saint-Jean, some with carrier rate estimates as high as 1/22. We also observed a greater deleterious mutational load for the studied variants in individuals from the Saguenay-Lac-Saint-Jean compared with other urban Quebec regions. These findings were brought to the clinic, where 12 pathogenic variants were detected in patients, including 3 that are responsible for very severe diseases and could be considered for inclusion in a carrier test for the Saguenay-Lac-Saint-Jean population. This study highlights the potential underestimation of rare disease prevalence and presents a population-based approach that could aid clinicians in their diagnostic efforts and patient management.
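A carrier rate such as 1/22 can be translated into an approximate expected frequency of affected births for a recessive variant. The Python sketch below assumes Hardy-Weinberg equilibrium, random mating, and a rare allele (so that the carrier rate 2pq is approximately 2q); founder populations with local substructure can deviate from these assumptions, so the result is illustrative only and is not a figure reported by the authors.

```python
from fractions import Fraction


def recessive_incidence(carrier_rate: Fraction) -> Fraction:
    """Approximate frequency of affected (homozygous) births for a rare
    recessive variant under Hardy-Weinberg equilibrium and random mating.
    Uses carrier rate 2pq ~= 2q, so q ~= carrier_rate / 2 (rare-allele approximation)."""
    q = carrier_rate / 2      # approximate allele frequency
    return q * q              # expected homozygote frequency, q^2


carrier = Fraction(1, 22)     # highest carrier rate estimate quoted in the abstract
print(recessive_incidence(carrier))   # 1/1936
# Same result via couples: P(both partners carriers) * 1/4 per pregnancy.
print(carrier * carrier / 4)          # 1/1936
```

Under these assumptions, a 1/22 carrier rate corresponds to roughly one affected birth in 1,936, which illustrates why such variants can be candidates for regional carrier testing.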

14

The Norwegian Mother, Father, and Child cohort study (MoBa) genotyping data resource: MoBaPsychGen pipeline v.1

Corfield, E. C.; Frei, O.; Shadrin, A. A.; Rahman, Z.; Lin, A.; Athanasiu, L.; Cevdet Akdeniz, B.; Hannigan, L.; Wootton, R. E.; Austerberry, C.; Hughes, A.; Tesli, M.; Westlye, L. T.; Stefansson, H.; Stefansson, K.; Njolstad, P. R.; Magnus, P.; Davies, N. M.; Appadurai, V.; Hemani, G.; Hovig, E.; Zayats, T.; Ask, H.; Reichborn-Kjennerud, T.; Andreassen, O. A.; Havdahl, A.

2022-06-26 genetics 10.1101/2022.06.23.496289 medRxiv

Top 0.1%

22.7%


Background: The Norwegian Mother, Father, and Child Cohort Study (MoBa) is a population-based pregnancy cohort that includes approximately 114,500 children, 95,200 mothers, and 75,200 fathers. Genotyping of MoBa has been conducted through multiple research projects spanning several years, using varying selection criteria, genotyping arrays, and genotyping centres. MoBa contains numerous interrelated families, which necessitated a family-based quality control (QC) pipeline that verifies and accounts for diverse types of relatedness. Methods: The MoBaPsychGen pipeline, comprising pre-imputation QC, phasing, imputation, and post-imputation QC, was developed based on current best-practice protocols and implemented to account for the complex structure of the MoBa genotype data. The pipeline includes QC at both the single nucleotide polymorphism (SNP) and the individual level. Phasing and imputation were performed using the publicly available Haplotype Reference Consortium release 1.1 panel as a reference. Information from the Medical Birth Registry of Norway and MoBa questionnaires was used to identify biological sex, year of birth, reported parent-offspring (PO) relationships, and multiple births (only available in the offspring generation). Results: In total, 207,569 unique individuals (90% of the unique individuals included in the study) and 6,981,748 autosomal SNPs passed the MoBaPsychGen pipeline. A further 174,462 chromosome X and 3,200 pseudoautosomal region (PAR) SNPs are available in a subset of these individuals (N = 204,913 and 135,593, respectively). The relatedness checks performed throughout the pipeline allowed identification of within-generation and across-generation first-degree, second-degree, and third-degree relatives.
The individuals passing post-imputation QC comprised 64,471 families ranging in size from singletons to 84 unique individuals (singletons are included as families because other family members may not have been genotyped, imputed, or passed post-imputation QC). The relationships identified include 287 monozygotic twin pairs, 22,884 full siblings, 117,004 PO pairs, 23,299 second-degree relative pairs, and 10,828 third-degree relative pairs. Discussion: MoBa contains a highly complex relatedness structure, with a variety of family structures including singletons, PO duos, full (mother, father, child) PO trios, nuclear families, blended families, and extended families. The availability of robustly quality-controlled genetic data for such a large cohort with a unique extended family structure will allow many novel research questions to be addressed. Furthermore, the MoBaPsychGen pipeline has potential utility in similar cohorts.
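Degree-of-relatedness calls like those above are usually made by comparing an estimated kinship coefficient with cut-offs placed between the expected values for each degree (1/2 for monozygotic twins, 1/4 for first-degree, 1/8 for second-degree and 1/16 for third-degree relatives). The Python sketch below uses the thresholds suggested in the KING software documentation; it is an assumption that MoBaPsychGen applies exactly these rules, and separating parent-offspring pairs from full siblings within the first-degree class needs an additional statistic (such as the proportion of sites sharing zero alleles identical by state), which is not shown.

```python
# Cut-offs halfway (on a log2 scale) between the expected kinship of
# successive degrees: 1/2, 1/4, 1/8, 1/16, 1/32. These are the thresholds
# suggested in the KING documentation; the MoBa pipeline's own rules are not
# given in the abstract and may differ.
KINSHIP_CUTOFFS = [
    (2 ** -1.5, "duplicate/monozygotic twin"),  # > ~0.354
    (2 ** -2.5, "first-degree"),                # > ~0.177 (parent-offspring or full siblings)
    (2 ** -3.5, "second-degree"),               # > ~0.088
    (2 ** -4.5, "third-degree"),                # > ~0.044
]


def classify_pair(kinship: float) -> str:
    """Map an estimated kinship coefficient to a degree of relatedness."""
    for cutoff, label in KINSHIP_CUTOFFS:
        if kinship > cutoff:
            return label
    return "unrelated (beyond third-degree)"


# Hypothetical kinship estimates, for illustration only.
for pair, phi in [("A-B", 0.49), ("A-C", 0.26), ("B-D", 0.12), ("C-E", 0.06), ("D-E", 0.01)]:
    print(pair, phi, classify_pair(phi))
```

Placing each cut-off at the geometric midpoint between adjacent expected values treats over- and under-estimation symmetrically on a log scale, which suits kinship values that halve with each additional degree.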

15

Non-random Mating Patterns in Education, Mental, and Somatic Health: A Population Study on Within- and Cross-Trait Associations

Torvik, F. A.; Sunde, H. F.; Cheesman, R.; Eftedal, N. H.; Keller, M. C.; Ystrom, E.; Eilertsen, E. M.

2023-11-27 psychiatry and clinical psychology 10.1101/2023.11.27.23299055 medRxiv

Top 0.1%

22.6%


Partners resemble each other on many traits, such as health and education. These traits are usually studied one at a time, in data from established couples and with potential participation bias. We studied all Norwegian parents who had their first child between 2016 and 2020 (N=187,926) and the siblings of these parents. We analysed grade point averages at age 16 (GPA), educational attainment (EA), and medical records with diagnostic data on 10 mental and 10 somatic health conditions measured between 10 and 5 years before childbirth. We found stronger partner similarity in mental (median r=0.14) than in somatic health conditions (median r=0.04), with ubiquitous cross-trait correlations for mental health conditions (median r=0.13). Partner correlations were 0.43 for GPA and 0.47 for EA. High GPA or EA was associated with better mental (median r=-0.16) and somatic (median r=-0.08) health in partners. Elevated correlations for mental health (median r=0.25) in established couples indicated convergence. Analyses of siblings and in-laws revealed deviations from direct assortment, suggesting instead indirect assortment based on related traits. Adjusting for GPA and EA reduced partner correlations in health by 30-40%. This has implications for the distribution of risk factors among children, for genetic studies, and for studies of intergenerational transmission.

16

Fine-scale structure of a whole regional population through genetics and genealogies

Morin, G.-P.; Moreau, C.; Barry, A.; Girard, S. L.

2025-10-06 genetics 10.1101/2025.09.23.678060 medRxiv

Top 0.1%

22.6%

The Saguenay-Lac-Saint-Jean (SLSJ) region of Quebec, Canada, a population shaped by a prominent founder effect, has long been considered genetically homogeneous. This study comprehensively investigates the fine-scale population structure within SLSJ by integrating genotype data from the CARTaGENE cohort with extensive genealogical records from the BALSAC population register. A time-efficient algorithm was developed to compute billions of kinship coefficients from genealogies in order to analyse an entire generation. We demonstrate a striking concordance between realised (genetic) and expected (genealogical) kinship (r = 0.78). Using both kinship measures, we reveal fine-scale population structure at the municipal level within SLSJ, challenging the notion of regional homogeneity. Our analysis highlights an east-west genetic gradient and uncovers migratory streams and differential genetic contributions of founders that shaped the genetic landscape of this population. This research provides insights into the interplay of genetics, demography, and historical events, underscoring the importance of fine-scale population structure in genetic studies and reaffirming the power of large-scale genealogical data.
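Expected (genealogical) kinship can be computed with the classic pedigree recursion: an individual's self-kinship is 1/2 × (1 + the kinship of its two parents), and for two distinct individuals one recurses through the parents of whichever individual cannot be an ancestor of the other, with unknown parents treated as unrelated founders. The sketch below implements that textbook recursion with memoisation on a made-up pedigree. It is not the time-efficient algorithm the authors developed for BALSAC, whose details the abstract does not give, and it would not scale to billions of pairs as written.

```python
from functools import lru_cache

# Hypothetical toy pedigree: individual -> (father, mother); None marks an unknown parent.
PEDIGREE = {
    "gf": (None, None), "gm": (None, None),
    "dad": ("gf", "gm"), "aunt": ("gf", "gm"),
    "mom": (None, None), "uncle_in_law": (None, None),
    "child": ("dad", "mom"), "cousin": ("uncle_in_law", "aunt"),
}


def make_kinship(pedigree):
    depth_cache = {}

    def depth(x):
        """Generation depth: 0 for founders, else 1 + the deepest known parent."""
        if x not in depth_cache:
            parents = [p for p in pedigree[x] if p is not None]
            depth_cache[x] = 1 + max(map(depth, parents)) if parents else 0
        return depth_cache[x]

    @lru_cache(maxsize=None)
    def kinship(a, b):
        if a is None or b is None:
            return 0.0  # unknown parents are treated as unrelated founders
        if a == b:
            father, mother = pedigree[a]
            return 0.5 * (1.0 + kinship(father, mother))
        # Recurse on the deeper individual: an ancestor always has a strictly smaller
        # depth, so the deeper (or equally deep) one cannot be an ancestor of the other.
        if depth(a) > depth(b):
            a, b = b, a
        father, mother = pedigree[b]
        return 0.5 * (kinship(a, father) + kinship(a, mother))

    return kinship


if __name__ == "__main__":
    kin = make_kinship(PEDIGREE)
    for pair in [("dad", "aunt"), ("child", "dad"), ("child", "cousin"), ("child", "child")]:
        print(pair, kin(*pair))
```

On this toy pedigree, `dad`/`aunt` (full siblings) and `child`/`dad` (parent-offspring) each give 0.25, and the first cousins `child`/`cousin` give 0.0625, matching the textbook expectations that realised (genetic) kinship estimates are compared against.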

17

Using the ancestral recombination graph to study the history of rare variants in founder populations

Mejia-Garcia, A.; Diaz-Papkovich, A.; Sillon, G.; D'Agostino, D.; Chong, A.-L.; Chong, G.; Lo, K. S.; Baret, L.; Hamel, N.; Chapdelaine, V.; Foulkes, W. D.; Taliun, D.; Shapiro, A. J.; Lettre, G.; Gravel, S.

2025-03-13 genetics 10.1101/2025.03.13.643149 medRxiv

Top 0.1%

22.1%

Gene genealogies represent the ancestry of a sample and are often encoded as ancestral recombination graphs (ARG). It has recently become possible to infer these gene genealogies from sequencing or genotyping data and use them for evolutionary and statistical genetics. Unfortunately, inferred gene genealogies can be noisy and subject to biases, making their applications more challenging. This project aims to study the application of ARG methods to systematically impute and trace the transmission of all disease variants in founder populations, where long shared haplotypes allow for accurate timing of relatedness. We applied these methods to the population of Quebec, where multiple founder events led to an uneven distribution of pathogenic variants across regions and where extensive population pedigrees are available. We validated our approach with nine founder mutations in the Saguenay-Lac-Saint-Jean (SLSJ) region, demonstrating high accuracy for mutation age, imputation, and regional frequency estimation. Moreover, we showed that this subset of high-quality carriers is sufficient to capture previously described associations with pathogenic variants in the LPL gene. This method systematically characterizes rare variants in founder populations, establishing a fast and accurate approach to inform genetic screening programs.
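The core idea of tracing a variant on a gene genealogy can be illustrated on a single marginal tree of an ARG: a mutation placed on one branch is carried by every sampled haplotype below that branch, and its age is bounded by the times of the branch's two end nodes. The tree, node times, and region labels below are made up, and real analyses would work on ARGs inferred by dedicated software rather than hand-written dictionaries. This is a conceptual sketch, not the authors' pipeline.

```python
# Toy marginal genealogy at one genomic position: sample haplotypes 0-3 are leaves,
# internal nodes 4-6 are ancestors, node 6 is the root. Times are generations ago.
PARENT = {0: 4, 1: 4, 2: 5, 3: 5, 4: 6, 5: 6, 6: None}
TIME = {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 8.0, 5: 15.0, 6: 40.0}
REGION = {0: "SLSJ", 1: "SLSJ", 2: "Montreal", 3: "SLSJ"}  # hypothetical labels
SAMPLES = sorted(REGION)


def carriers(mutation_node):
    """Sample haplotypes descending from mutation_node, i.e. below the mutated branch."""
    found = []
    for sample in SAMPLES:
        node = sample
        while node is not None:  # walk up towards the root
            if node == mutation_node:
                found.append(sample)
                break
            node = PARENT[node]
    return found


def age_interval(mutation_node):
    """A mutation on the branch above mutation_node (not the root) arose in this window."""
    return TIME[mutation_node], TIME[PARENT[mutation_node]]


def regional_frequency(mutation_node, region):
    """Fraction of the region's sample haplotypes that carry the mutation."""
    in_region = [s for s in SAMPLES if REGION[s] == region]
    hits = [s for s in carriers(mutation_node) if REGION[s] == region]
    return len(hits) / len(in_region)


if __name__ == "__main__":
    print(carriers(4), age_interval(4), regional_frequency(4, "SLSJ"))
```

Here `carriers(4)` returns haplotypes 0 and 1, the mutation arose between 8 and 40 generations ago, and its frequency among the toy SLSJ haplotypes is 2/3. Noise in the inferred tree topology or node times propagates directly into all three quantities, which is why validation against known founder mutations matters.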

18

Clinical genetics and its adjacent regimes

Lange, T. Z.; Rigter, T.; Vrijenhoek, T.

2020-06-05 genetic and genomic medicine 10.1101/2020.06.04.20102939 medRxiv

Top 0.1%

20.2%

Clinical genetics is the prime application of genetics in healthcare, providing highly advanced and reliable diagnostics for patients with (mostly rare) diseases of genetic origin. Whereas many novel technologies have expanded the genetic toolkit, integration or alignment with other areas of healthcare is often challenging. We hypothesise that this is due to the characteristics inherent to the regimes in which the genetic technologies were to be implemented. To facilitate the integration of genetic applications in a rebooting and perhaps transforming healthcare system, we here provide insights into discrepancies between clinical genetics and four of its adjacent regimes: public health, human genetic research, non-genetic healthcare, and society. We conducted twelve semi-structured group interviews and a focus group to collect information on overlapping and distinctive elements of each regime. We identified three aspects in which the adjacent regimes differed considerably from clinical genetics: perception of data, expectations of technologies, and compartmentalisation units. Strikingly, divergence within each of these aspects was determined by elements of culture and not, as is often thought, by elements of structure such as regulation and policy. We conclude that implementing genetics requires transdisciplinary empathy: an understanding of the ways of organizing, thinking, and doing in adjacent regimes.

19

Precision Colorectal Cancer Screening with Polygenic Risk Score

Tasa, T.; Puustusmaa, M.; Tonisson, N.; Kolk, B.; Padrik, P.

2020-08-22 genetic and genomic medicine 10.1101/2020.08.19.20177931 medRxiv

Top 0.1%

18.7%

Colorectal cancer (CRC) is the second most common cancer in women and the third most common cancer in men. Genome-wide association studies have identified numerous genetic variants (SNPs) independently associated with CRC. The effects of such SNPs can be combined into a single polygenic risk score (PRS). Stratification of individuals according to PRS could be introduced into primary and secondary prevention. Our aim was to combine risk stratification based on a sex-specific PRS model with recommendations for individualized CRC screening. Previously published PRS models for predicting the risk of CRC were collected from the literature. These were validated on the UK Biobank (UKBB), consisting of a total of 458,696 quality-controlled genotypes with 1810 and 1348 prevalent male and female cases, and 2410 and 1810 incident male and female cases. The best-performing sex-specific model was selected based on the AUC in the prevalent data and independently validated in the incident dataset. Using Estonian CRC background information, we performed absolute risk simulations and examined the ability of the PRS to risk-stratify individual screening recommendations. The best-performing model included 91 SNPs. Its C-index in the dataset was 0.613 (SE = 0.007) and the hazard ratio (HR) per unit of PRS was 1.53 (1.47-1.59) for males. The respective metrics for females were 0.617 (SE = 0.006) and 1.50 (1.44-1.58). PRS risk simulations showed that a genetically average 50-year-old female doubles her risk by age 58 (55 in males) and triples it by age 63 (59 in males). In addition, the best-performing PRS model was able to assign individuals to one of the seven groups proposed by Naber et al. for different colonoscopy screening regimens. We have thus combined PRS-based risk stratification with recommendations for individual screening attendance. Our approach is easily adaptable to other nationalities by using background data from genetically similar populations.
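A PRS is usually a weighted sum of effect-allele dosages, and a hazard ratio reported "per unit of PRS" compounds multiplicatively under a Cox-type model. The sketch below assumes the score is standardized to SD units of a reference population (the abstract does not say what one unit is) and uses made-up SNP weights; it is not the 91-SNP model from the study. The absolute-risk helper relies on the proportional-hazards identity S_i(t) = S_0(t)^HR, where S_0 is the survival of a reference individual with a standardized score of 0.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0, 1 or 2, or imputed values) over SNPs."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def standardize(score, ref_mean, ref_sd):
    """Express a raw score in SD units of a reference population (an assumption)."""
    return (score - ref_mean) / ref_sd


def relative_hazard(z, hr_per_unit):
    """Hazard relative to a standardized score of 0 under a log-linear effect."""
    return hr_per_unit ** z


def absolute_risk(baseline_risk, hr):
    """Cumulative risk over a period for someone with hazard ratio hr, given the
    reference individual's cumulative risk over the same period (proportional hazards)."""
    return 1.0 - (1.0 - baseline_risk) ** hr


if __name__ == "__main__":
    # Hypothetical 3-SNP example; weights and reference moments are made up.
    raw = polygenic_score([0, 1, 2], [0.10, 0.20, 0.05])
    z = standardize(raw, ref_mean=0.25, ref_sd=0.10)
    hr = relative_hazard(z, 1.53)  # 1.53: male HR per unit reported in the abstract
    print(z, hr, absolute_risk(0.05, hr))
```

For example, if one unit is one SD, a man at z = 2 would have `relative_hazard(2.0, 1.53)` ≈ 2.34 times the hazard of a genetically average man; combining such ratios with population background incidence is what makes age-at-risk statements like "doubles her risk by age 58" possible.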

20

Reconstructing the genetic history of Kra-Dai speakers from Thailand

Changmai, P.; Koci, J.; Flegontov, P.

2022-07-02 genetics 10.1101/2022.06.30.498332 medRxiv

Top 0.1%

18.5%

The genetic history of the Thai people and, more generally, of speakers of the Kra-Dai languages (also known as Tai-Kadai languages) in Thailand remains a topic of debate. Recently, Kutanan et al. [1] analyzed genome-wide genetic data for dozens of present-day human populations from Thailand and surrounding countries and concluded that the Central Thai, Southern Thai, and Malay from Southern Thailand are genetically continuous with Austroasiatic speakers such as the Mon, and thus that the advent of Kra-Dai and Austronesian languages in Central and Southern Thailand was overwhelmingly a result of cultural rather than genetic diffusion. We re-analyzed the genetic data reported by Kutanan et al. [1] using an advanced technique for inferring admixture graph models, analyses of autosomal haplotypes, and other methods. We did not reproduce the results of Kutanan et al. [1], and our analyses revealed a more complex picture of the genetic history of Kra-Dai speakers and other populations of Thailand.
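Admixture graph fitting (for example with the ADMIXTOOLS family of software) is built on f-statistics computed from population allele frequencies. The sketch below shows the basic f3 and f4 definitions on plain lists of per-SNP frequencies. It omits the finite-sample bias correction that f3 needs and the block-jackknife standard errors that real analyses report, uses made-up frequencies, and is not the specific admixture-graph technique used in this reanalysis.

```python
def f3(c, a, b):
    """f3(C; A, B): mean over SNPs of (pC - pA)(pC - pB).
    A clearly negative value suggests C is admixed between sources related to A and B.
    (No correction for finite sample size is applied here.)"""
    if not len(c) == len(a) == len(b):
        raise ValueError("frequency lists must have equal length")
    return sum((pc - pa) * (pc - pb) for pc, pa, pb in zip(c, a, b)) / len(c)


def f4(a, b, c, d):
    """f4(A, B; C, D): mean over SNPs of (pA - pB)(pC - pD).
    Expected to be near zero if the unrooted tree ((A, B), (C, D)) holds without
    additional gene flow between the two pairs."""
    if not len(a) == len(b) == len(c) == len(d):
        raise ValueError("frequency lists must have equal length")
    return sum((pa - pb) * (pc - pd) for pa, pb, pc, pd in zip(a, b, c, d)) / len(a)


if __name__ == "__main__":
    # Made-up allele frequencies at two SNPs for four hypothetical populations.
    pop_a, pop_b = [0.1, 0.5], [0.2, 0.5]
    pop_c, pop_d = [0.3, 0.6], [0.1, 0.4]
    print(f4(pop_a, pop_b, pop_c, pop_d))
    print(f3([0.5, 0.5], [0.0, 1.0], [1.0, 0.0]))
```

An admixture graph implies a specific expected value for every such statistic, so competing graphs (for instance, genetic continuity versus substantial gene flow into Kra-Dai speakers) can be compared by how well they fit the observed f-statistics.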